Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes and generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from our novel knowledge graph that encodes object-region relations, spatial knowledge from dual Graph Encoder Networks, and background knowledge from a series of pre-training tasks -- all within a reinforcement learning framework for audio-visual navigation. We also define a new audio-visual navigation sub-task, where agents are evaluated on novel sounding objects, as opposed to unheard clips of known objects. We show improvements over strong baselines in generalisation to unseen regions and novel sounding objects, within the Habitat-Matterport3D simulation environment, under the SoundSpaces task.
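To make the graph-based scene prior concrete, here is a minimal sketch of how an object-region knowledge graph could be encoded with a graph convolution in PyTorch. The toy nodes, edges, and `GraphEncoder` module are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a knowledge-graph scene prior: a toy object-region
# graph encoded by one graph-convolution layer. All names and sizes are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

# Toy knowledge graph: nodes are objects and regions; edges encode
# "this object is typically found in this region" relations.
NODES = ["piano", "tv", "shower", "living_room", "bathroom"]
EDGES = [("piano", "living_room"), ("tv", "living_room"), ("shower", "bathroom")]

idx = {name: i for i, name in enumerate(NODES)}
n = len(NODES)

# Symmetric adjacency with self-loops, row-normalised (standard GCN preprocessing).
A = torch.eye(n)
for u, v in EDGES:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0
A = A / A.sum(dim=1, keepdim=True)

class GraphEncoder(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, H):
        return torch.relu(self.lin(A_hat @ H))

# Learnable node embeddings stand in for semantic node features; the
# resulting node encodings could then condition a navigation policy.
H = nn.Embedding(n, 32).weight
enc = GraphEncoder(32, 64)
node_repr = enc(A, H)  # shape (5, 64): one prior feature per object/region
print(node_repr.shape)
```

In the described method, two such encoders (dual Graph Encoder Networks) would presumably produce semantic and spatial node representations that are consumed by the reinforcement-learning agent.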
Humans leverage multiple sensory modalities when interacting with objects and discovering their intrinsic properties. The visual modality alone is insufficient for deriving intuitions about object properties (e.g., which of two boxes is heavier), so non-visual modalities such as touch and audio must also be considered. Although robots can leverage various modalities to acquire an understanding of object properties through learned exploratory interactions with objects (e.g., grasping, lifting, and shaking behaviors), a challenge remains: the implicit knowledge one robot acquires through object exploration cannot be directly used by another robot with a different morphology, because sensor models, observed data distributions, and interaction capabilities differ across these robot configurations. To avoid the costly process of learning interactive object-perception tasks from scratch for each new robot, we propose a multi-stage projection framework for transferring implicit knowledge of object properties across heterogeneous robot morphologies. We evaluate our method on object-property recognition and object-identity recognition tasks, using a dataset containing two heterogeneous robots that performed 7,600 object interactions. The results show that knowledge can be transferred across robots, so that a newly deployed robot can bootstrap its recognition models without exhaustively exploring all objects. We also propose a data augmentation technique and show that it improves the generalization of our models. We release our code and datasets here: https://github.com/gtatiya/implitic-knowledge-transfer.
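As a rough illustration of the projection idea, the sketch below maps one robot's interaction features into another robot's feature space through a shared latent stage, trained on paired interactions with common objects. The dimensions, names, and two-stage design are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of cross-morphology knowledge projection: learn a mapping
# from a source robot's feature space to a target robot's, using paired
# interactions with the same objects. All sizes/names are assumptions.
import torch
import torch.nn as nn

SRC_DIM, TGT_DIM, LATENT = 128, 96, 64

# Stage 1 projects source-robot features into a shared latent space;
# stage 2 decodes latents into the target robot's feature space.
projector = nn.Sequential(
    nn.Linear(SRC_DIM, LATENT), nn.ReLU(),  # stage 1: source -> latent
    nn.Linear(LATENT, TGT_DIM),             # stage 2: latent -> target
)

opt = torch.optim.Adam(projector.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy paired data: both robots interacted with the same objects, so each
# source feature row has a corresponding target feature row.
src_feats = torch.randn(256, SRC_DIM)
tgt_feats = torch.randn(256, TGT_DIM)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(projector(src_feats), tgt_feats)
    loss.backward()
    opt.step()

# A newly deployed target robot could reuse the source robot's labelled
# experience via projected pseudo-features, instead of exploring every object.
projected = projector(src_feats)  # features expressed in the target space
```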